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Entropic and displacement interpolation: 
a computational approach using the Hilbert metric 

Yongxin Chen, Tryphon T. Georgiou and Michele Pavon 


Abstract 

Monge-Kantorovich optimal mass transport (OMT) provides a blueprint for geometries in the space of positive 
densities - it quantifies the cost of transporting a mass distribution into another. In particular, it provides natural 
options for interpolation of distributions (displacement interpolation) and for modeling flows. As such it has been 
the cornerstone of recent developments in physics, probability theory, image processing, time-series analysis, and 
several other fields. In spite of extensive work and theoretical developments, the computation of OMT for large 
scale problems has remained a challenging task. An alternative framework for interpolating distributions, rooted in 
statistical mechanics and large deviations, is that of Schrodinger bridges (entropic interpolation). This may be seen 
as a stochastic regularization of OMT and can be cast as the stochastic control problem of steering the probability 
density of the state-vector of a dynamical system between two marginals. In this approach, however, the actual 
computation of flows had hardly received any attention. In recent work on Schrodinger bridges for Markov chains 
and quantum evolutions, we noted that the solution can be efficiently obtained from the fixed-point of a map which 
is contractive in the Hilbert metric. Thus, the purpose of this paper is to show that a similar approach can be taken 
in the context of diffusion processes which i) leads to a new proof of a classical result on Schrodinger bridges 
and ii) provides an efficient computational scheme for both, Schrodinger bridges and OMT. We illustrate this new 
computational approach by obtaining interpolation of densities in representative examples such as interpolation of 
images. 


Index Terms 

Optimal mass transport, Schrodinger bridges, Hilbert metric, interpolation of densities, image morphing 
AMS Classification: 47H07, 47H09, 60J25, 34A34, 49J20 


I. Introduction 

The present paper is concerned with the problem of interpolating densities. Linear interpolation, which 
is widely used in signal analysis and image processing, has several drawbacks including fade-in fade-out 
effects, see e.g., [J2J, [[24|. For this reason other methods have been pursued. The purpose of this work is 
to relate three relevant subjects, optimal mass transport, Schrodinger bridges, and the Hilbert metric, that 
provide a geometric framework for interpolation of densities. All these three subjects have a long and 
notable history. However, the connection between all three represents a recent development. 

Among our three subjects, the one with the longest history is that of Monge-Kantorovich Optimal Mass 
Transport (OMT) originating in the work of Gaspard Monge in 1781 [[331 . After more than a century of 
attempts by renowned mathematicians, the problem of transferring a mass from a starting distribution to 
a final one, while incurring minimal cost, found its modem formulation and solution in the 1940’s: 
Leonid Kantorovich introduced duality and linear programming to address specifically this problem. 
In the past twenty years, Ambrosio, Benamou, Brenier, McCann, Gangbo, Jordan, Kinderlehrer, Otto, 
Rachev, Riischendorf, Villani and several others drew connections with gradient flows, the heat equation, 
logarithmic Sobolev inequalities, natural geodesic structures for interpolating densities (displacement 
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interpolation ), all with applications to several topics of great physical and engineering importance; see 
lfI9l , POl , P51 . PI , IfTM , 1401 , 14I1 . 11351 . Besides the intrinsic importance of optimal mass transport to 
the geometry of spaces and the multitude of applications, a significant impetus for recent work has been 
the need for effective computation 0, 12, ® which is often challenging. 

Historically, our next subject, Hilbert’s projective metric, was introduced in 1895 l22l . In the 1950’s 
Garrett Birkhoff and Hans Samelson realized the significance of the metric in proving the existence of 
positive eigenvectors to linear operators that leave a cone invariant via a contraction principle, see 1271 . 
Birkhoff’s version of the Hilbert metric was further developed for nonlinear operators and popularized by 
Bu shell 0; for an overview and a historical account of key developments on the subject see [j26l , [|20l . 
Our usage of the Hilbert metric will be in the same spirit, to ascertain and also construct solutions to a 
system of equations using a contraction mapping principle. 

Our third subject, commonly referred to as the Schrodinger bridge problem, pertains to steering the 
density of particles from an initial to a terminal one-time distribution with minimal energy. The original 
formulation due to Schrodinger in 1931/32 lf38l . P9l aimed at explaining the conundrum of quantum 
mechanics by providing a classical stochastic reformulation (a task partially achieved by Nelson 041 35 
years later). Schrodinger’s original question was not phrased in terms of “steering,” but as the question to 
identify the most likely flow of the density of particles between two points in time where their distribution 
is known ll42l ; see also m, a, m for early important contributions. It only later became clear that 
Schrodinger’s question can also be posed as a minimum energy steering between marginals lH41l . llT5l . 
This, in itself, has led to developments of engineering significance |[TT|. [fl2|, ll9l . Flows of densities 
between end-point marginals specified by Schrodinger bridges provide what is referred to as entropic 
interpolation ||28l , 11291 and can be seen as a “regularized” counterpart of OMT geodesics. Recent insights 
in the above references have extended and broadened the scope of Schrodinger bridges to problems of 
great engineering interest. 

The structure of our paper and the connection between all three topics will proceed as follows. We 
first explain the topic of Monge-Kantorovich optimal mass transport for the special case of quadratic 
cost. We then briefly review Schrodinger bridges and explain how Monge-Kantorovich OMT is a limiting 
case when stochastic excitation goes to zero, following Mikami, Thieullen, and Leonard Oil . Oil . Il28l . 
Il29l . Finally, we bring in the Hilbert metric, and exploit contractiveness of certain maps that underly 
Schrodinger’s problem to construct solutions. In the process, we give an independent proof of a classical 
result due to Jamison on the existence of solutions to Schrodinger systems. The numerics and the speed 
of convergence hold the promise that this may provide an attractive approach to solving general OMT 
problems and displacement interpolation. It certainly provides a numerical approach to Schrodinger bridge 
problems and entropic interpolation, that has not received much attention in the past. 

II. Background 

We briefly discuss the three topics of interest and the links between them. 


A. Optimal mass transport 

Gaspar Monge’s formulation begins with two non-negative measures po, pi on W 1 having equal total 
mass. Throughout, these will be probability measures; when absolutely continuous we denote by pi(x) 
(i G {0,1}) the corresponding densities, i.e., pi(dx) = pi{x)dx. The OMT problem seeks a transportation 
plan (that is, a measurable map) T from M n —y M n so that 

Pi(-) = PoiT-'i-)) 


( 1 ) 
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and the cost of transportation 

[ h\x ~T(x)\\ 2 iJ,o(dx) (2) 

is minimally] The composition /i 0 (T _1 (-)) is commonly referred to as the “push-forward” and denoted by 
Tfl/x 0 . Further, for the problem to be well-posed the distributions must have finite second moments. 


In Kantorovich’s formulation po, pi are thought of as marginals on the product space M" x R n , and 
the transportation cost becomes 




y\\ 2 7r(dxdy) 


(3) 


where it, refered to as a “coupling” between /i 0 and p\, is a measure on the product space with marginals 
along the two coordinate directions equal to p 0 and // 3 , respectively—the space of all such couplings will 
be denoted by n(/x 0 , A*i)- 


The subtle technical challenge that impeded satisfactory solutions of Monge’s problem for over a 
century is the fact that an optimal transport plan T may not exist in general. When one exists, the optimal 
coupling has support on the graph of T, and this is the case when the two marginals /i 0 and p i are 
absolutely continuous |[40ll . 


The existence of transportation map T provides a way to connect p 0 and //, through (McCann’s) 
displacement interpolation 

Pt(dx) — ((1 t)x + tT(x))$p 0 (dx), 0 < t < 1. (4) 

This family of distributions, assuming T is an optimal transportation plan, is in fact a geodesic in the space 
of distributions when metrized by the Wasserstein 2-metric fiTil . [f36l . [[281 , [fl9l . We note in passing that 
interest in interpolating distributions arises in physical modeling, resource allocation, spectral analysis, 
filtering and smoothing of time-series, etc., see e.g. [|251 . [fl6l . [f35ll . 


B. Schrodinger bridges 


Schrodinger [1381 , [|39l in 1931/32 sought to reconcile empirical observations of diffusive particles with 
known priors. More specifically he considered a large number of i.i.d. Brownian particles transitioning 
between two empirical distributions p 0 (x)dx and pi(x)dx at two end-points in time, t = 0 and t — 1, 
respectively. He hypothesized that pi(x) considerably differs from what we expect according to the law 
of large numbers, namely 


q(0,y,l,x)p 0 (y)dy, 


where 


q(s,y,t,x ) 


(2vr)- ?l/2 (/-s)- ri/2 exp 


( l|k-//ll 2 \ 
V 2 (t-s) ) 


denotes the Brownian transition probability kernel, and sought to determine the “most likely” evolution of 
the cloud of particles from p 0 to p\ . In the language of large deviations, Schrodinger’s question amounts 
to seeking a probability law on path space that agrees with the observed empirical marginals and is closest 
to the prior law of the Brownian diffusion in the sense of relative entropy ifTTIl . 


Schrodinger’s solution (called Schrodinger bridge), which was formally established by Fortet [fl8l and 
later on generalized by Beurling [@J and Jamison ll23ll to more general reference measures, is a follows 
(see also Follmer [fTTl f. 


'G. Monge's original formulation sought to minimize 
also draws on much richer a theory. 


f ||a: — T(x)\\fio(dx) instead of the present work focuses on the latter which 
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Theorem 1: Given two probability measures p 0 (dx) = p 0 (x)dx and // 1 (dx) = p\ (x)dx on M" and the 
Markov kernel q(s , y , t , x), there exists a unique pair of cr-finite measures ( <p 0 (x)dx , ^(x^x) on such 
that the measure P 01 on W 1 x M n defined by 


Voi(E) = / g(0,y, l,x)< 1 3o(2/)< / 9i(x)d|/dx (5) 

J E 

has marginals // 0 and p i. Furthermore, The Schrodinger bridge between // 0 and p, 1 has as one-time marginal 


V t {dx) = tp(t,x)p(t,x)dx 

(6) 

at time t, with 


<p(t,x) = 1 q(t,x,l,y)tp 1 (y)dy 

(7a) 

0(t,x) = j q{0,y,t,x)ip 0 {y)dy. 

(7b) 

Given marginal distributions p 0 (dx) = p 0 (x)dx and pfidx) = pi(x)dx. 

the distribution flow ([6]) is 


referred to as the entropic interpolation with prior q between p () and p i, or simply entropic interpolation 
when it is clear what the Markov kernel q is. The pair (</? 0 (-)> <^i(-)) = (<£(0, •), <^(1, •)) is the key to 
the Schrodinger problem; it is the unique solution to the so-called Schrodinger system which consists of 
the two linear equations ( [7a] ) with t = 0 and ( |7b] ) with t — 1, together with the two nonlinear coupling 
conditions 

Pt(x) = ip{t,x)<p{t,x), t = 0,1. (8) 

Its existence and uniqueness have been studied for a long time |[T8l , [j4l , [|23l , IfTTlI . 


C. The Hilbert metric 


The Hilbert metric was introduced by David Hilbert in 1895 Il22ll . The form that the metric takes 
to quantify distance between rays in positive cones, as used herein, is due to Garrett Birkhoff [0. The 
importance of the metric and subsequent developments are being discussed in 0. See also a recent survey 
on its applications in analysis ll27l . The Hilbert metric and certain key facts are presented next. 

Definition 2: Let S be a real Banach space and let 1C be a closed solid cone (with nonempty interior 
/C + ) in S, i.e., AC is a closed subset of S with the properties (i) AC + AC C AC, (ii) ctAC C AC for all real 
a > 0 and (iii) AC D —AC = {0}. The cone AC induces a partial order relation ^ in S 

x ^ y ^ y — xgAC. 


For any x,y E AC + := AC\{0}, define 


with the convention inf 0 


M(x,y) := inf{A | x p< A y}, 
7n(x,y) := sup{A | Xy ■< x} 


oo. Then the Hilbert metric is defined on AC + by 


d H {x,y) := log 


( M(x,y) \ 

\™(x,y) J ' 


Strictly speaking, the Hilbert metric is a projective metric since it remains invariant under scaling by 
positive constants, i.e., d H (x,y ) = d H (Xx,y) = d H (x,Xy) for any A > 0 and, thus, it actually measures 
distance between rays and not elements. 
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A map £ : /C —» JC is said to be positive if £ : /C + —> /C + . For such a map define its projective diameter 

A(£) := sup {d H (£(x),£(y)) \ x,y G /C + } 

and the contraction ratio 

k(£) := inf {A | cta(£ (a;), £(?/)) < A d H (x,y) Vx, y G /C + } 

For a positive map £ which is also linear, we have k{£) < 1. In fact, Birkhoff’s theorem gives the relation 
between A(£) and k(£) as 

k{£) = tanh(-A(£)), (9) 

see m, na. Also, another important relation that needs to be pointed out is 

A(£) < 2sup {cIh{£{x) : 1) | x G /C + }. 

This follows directly from the definition of A(£) and the triangular inequality 

d H (£(x),£(y)) < d H (£(x), 1) + d H (l,£(y)), x, y G K + . 


D. Connections between the three topics 

The Monge-Kantorovich problem ([3]) turns out to be the limit of the Schrodinger problem as the 
diffusion coefficient of the reference Brownian motion tends to zero EO, [|32l . Il28l . [|29l , lITOl . Ifl3l . The 
precise statement follows. 

Theorem 3: Let p 0 (dx) = p 0 (x)dx, pi(dx) = pi(x)dx be two probability measures on M n with finite 
second moment, V e be the solution of the Schrodinger problem with Markov kernel 

q e (s,y,t,x) = (2vr )~ n/2 ((t - s)e)" n/2 exp - s)e ) 

and marginals // (J and p\, and tt be the solution to the Monge-Kantorovich problem ([3]) with the same 
marginal distributions. Then weakly converges to tt as e goes to 0. Moveover, the entropic interpolation 
Vf also weakly converge to the displacement interpolation /// . 

Thus, according to Theorem [3j the displacement interpolation between given marginals can be approx¬ 
imated by their entropic interpolation for a small enough diffusion coefficient. 

For both, the Monge-Kantorovich problem as well as the Schrodinger bridge problem, the main objects 
are probability distributions. These are nonnegative, by definition, and thereby belong to a convex set 
(simplex) or a cone (if we dispense of the normalization). According to Birkhoff’s theorem ([9]), as we 
noted, linear endomorphisms of a positive cone are contractive; a fact which is often the key in obtaining 
solutions of corresponding equations. Thus, the geometry underlying both problems is expected to be 
impacted by endowing distributions with a suitable version of the Hilbert metric. This is done in the next 
section. 


III. Schrodinger system redux 

Herein, we provide a new proof for the existence and uniqueness of the solution to the Schrodinger 
system. To this end, we show that the solution (p, p) to the Schrodinger system of equations is the unique 
fixed point of a contraction in the Hilbert metric. Thereby, we also obtain an efficient algorithm to solve 
the Schrodinger system. We begin by considering p 0 and f>\ having compact support. 
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Proposition 1: Suppose that, for i G {0,1}, S) C M n is a compact set, pi(x)dx is absolutely continuous 
probability measure (with respect to the Lebesgue measure) on the cr-ficld E* of Borel sets of Si, and 
that q is an everywhere continuous, strictly positive function on S 0 x Si. Then, there exist nonnegative 
functions 99 ( 0 , •), <0(0, •) defined on S 0 and <p(l, •), <0(1, •) defined on Si satisfying the following so-called 
Schrodinger system of equations: 


<p( 0 ,x) 

= J 

1 q(0,x,l,y)<p{l,y)dy, 

Si 

(10a) 

< 0 ( 1 , x) 

= J 

1 q(0,y,l,x)<p{0,y)dy, 

So 

(10b) 

Po(x) 

00 

o' 

00 

00 

cT 

00 

(10c) 

Pl(x) 


(10d) 


Moreover, this solution is unique up to multiplication of <p(0, •) and 99 ( 1 , •) and division of 0(0, •) and 
< 0 ( 1 , •) by the same positive constant. 


In order to study the Schrodinger system ( fTO] ) we consider 



£ : 

p{l,xi) i->- (p(0,xo) = / q($,x Q ,l,xi)y(l,xi)dxi 

J Si 

(11a) 


£ ] : 

(0(0, X 0 ) f-G (0(1, Xi) = / 9 ( 0 , (Co, 1, £i)<0(O, Xo)dXo 

j So 

(lib) 


: 

(p(0,x o ) 1 —>■ (0(0, (Co) = po(xo)/ <p(0, xo) 

(He) 


22pi ■ 

<0(1, Xi) I— >• <p(l,(Ci) = pi ((Ci) / <0(1, (Cl). 

(Hd) 

We also define 


£?(5) := U^ms) | f(x) > £,Vi G S}, 


for e > 0, and 





r + (S) := U £?(S), 


£>0 

for p G {1, 2, cx)} and S G {S'o, 5) }, and endow C°£(S) with the Hilbert metric with respect to the natural 
partial order of inequality between elements (almost everywhere). 

Since the kernel q is positive and continuous on the compact set S'o x Si, it is bounded from below 
and above, i.e., there exist 0 < a < (3 < 00 such that 


cx < 5 ( 0 , X, 1, y) < 13, V(x, y) e So X Si. (12) 

It follows that map nonnegative integrable functions (£(,), except the zero function, to functions 
that are strictly positive and bounded (£2°). Conversely, since p 0 and p\ are nonnegative and integrate to 
1 (though, possibly unbounded), T> po ,V pi map positive and bounded functions (£{°) to nonnegative and 
integrable ones (£j). Thus, the Schrodinger system relates to the following circular diagram 


<0(0, X 0 ) 

3 

y PO 

¥>(0,z 0 ) 




<0(1,Xi) 

dd pi 


p{l,Xi) 
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where 99(0, x 0 ), 0 ( 1 , £1) e Cf, while 99(1, £1), 0 ( 0 , x 0 ) E C g on the corresponding domains So , Si , i.e., 
the circular diagram provides a correspondence between spaces as follows, 


£h(s 0 ) 

cr(So) 


®pi 


We will focus on the composition C := £t o V po o £ o V pi , that is, 


C: C+{Si) -> ££(Si) 

\ £t °A>o o£oI V’i l ,^1 u 

: 99(1, Xi) -> (^(l,iCl))next 


and establish the following key lemma. 

Lemma 4: The contraction ratio for C is n(C) < 1. 

Proof: When using the Hilbert metric it is important to work on convex sets or cones with non-empty 
interior, cf. Definition [2j We note that this is not the case with C {) as it has an empty interior. Interestingly, 
this apparent difficulty in analyzing C = C o 'D P(j o £ oV pl can be circumvented if we factor C in a slightly 
different manner so that the factors map Cf into itself. It is noted that Cf is indeed a positive cone 
satisfying the condition (non-empty interior) in the definition of the Hilbert metric. 


To this end, we first define 

T> ■ f(x) ^ 


(13a) 


For simplicity of notation we use the same symbol V for the inversion of functions on either So or Si. 
Then, define 


£ • 
‘-'pi ■ 

g(x 1 ) 

/ q(Q,Xo,l,Xi)pi{;Xi)g(xi)dxi 
hi 

(13b) 

£ ] ■ 
°pi • 

h(x 0 ) 

/ q(0,x o ,l,x 1 )p o (x o )h(x o )dx o . 

JSo 

(13c) 


In this way, C can instead be expressed as the composition 

C = £loV°£ pi ° V, 

where now all operators map Cf to itself (albeit, with possibly different support, S 0 or Si). Accordingly, 
the circular diagram can be redrawn as follows: 


h(x 0 ) 

V 

£( 0 ,X 0 ) 


■> 0 ( 1 , Zl) 

V 

- g{xi). 


We now show that 

£ pi oV : Cf(Si) —>■ Cf(So) 

is strictly contractive in the Hilbert metric. The fact that, C oV : Cf(S 0 ) —* Cf(Si) is also contractive 
proceeds similarly. 

Consider two functions 0^(1, •), E Cf(S 1 ), for i E {1, 2}, and let 

9i(-) = ^(0i(!,O) = 0<( 1 >-)~ 1 , 

£i(0, •) = £ pi (gf-)). 











Since M(gi,g 2 ) = (m(g 1 1 ,g 2 x )) 1 , it follows that D is an isometry in the Hilbert metric. Next consider 
the map £ pi . The projective diameter of £ Pl is 


A(£ P1 ) < 2swp{d H (£ P1 (g),l) I 9 e £+(Si)}- 


But since for any g e £) 


a / pi(x 1 )g(x 1 )dxi < £ Pl (g) < f3 / p 1 (x 1 )g(x 1 )dxi, 


'S i 


'Si 


A(£ Pl ) < 2 sup {log ( 


supgpi(fl) . 
inf £ Pl (p) ' 


geCZW} 


< 21og( —) < oo. 
a 


Hence, 


Similarly we have 


k(£ Pi oV) = tanh(-A(£ pl )) 

< tanh(Jlog(-)) < 1. 
2 a 


«(4 oV)< tanh(^ log(-)) < 1. 


a 


Combining the above we have that n(C) < tanlr (| log(^)) <1. ■ 

Proof of Proposition |7j- Let f> 0 (1, ■) be any function in Cf{S\ ) and consider the sequence 

fk( 1 , ■) = C k <fo( 1 , •), 

for k = 1,2,.... This is Cauchy with respect to the Hilbert metric in because C is strictly 

contractive. We normalize so as to obtain the unit-norm sequence 

9 k(x) = , k = 1 , 2 ,..., 

which can be done since C°°(Si) C C 2 (Si). The sequence {g k } is also Cauchy with respect to the Hilbert 
metric in 


E = {/ G £+(Si) 


= li¬ 


lt is easy to see from ( fl2l ) that 
Since ||g fe || 2 = 1 we have 


sup X {g k (x)) < P 
mi x (g k (x)) ~ a 


sup(g k (x)) > 


y/\Sl 


> inf (g k (x)), 


(14) 


(15) 


where | Sf denotes the Lebesgue measure of ,S’,. Combining ( |T4j ) and ( [15] ) we easily deduce that {g k } is 
uniformly bounded from both, above and below; more precisely, 


oi B 

sup (g k (x)) < t= , while inf (g k {x)) > 


* pVWu 

For any two elements g k , g e of the sequence, 

m(g k ,g e ) < 1 < M(g k ,g e ), 


a 


vW 


(16) 
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and therefore 


II 9k - 5^||2 — 


(. 9k{x ) - ge(x)) 2 dx 


'Si 


1/2 


(W& - 1 ) 2 gt{xfdx 


'Si 


1/2 


< 


( M{g k , g e ) - m(g k , ge)) 2 g t (x) 2 dx 


'Si 


1/2 


= M(g k ,g e ) - m(g k ,g e ) 

= {exp(d H (g k , ge)) ~ l}m(g k , ge) 
< exp(d H (g k ,g e ))-l. 


It follows that {///J is also Cauchy in the 2-norm, and thus, converges to a unique // e £ 2 (Sj) which 
satisfies ||p || 2 = 1 as well as inherits the bounds in ( fl6| ). Therefore, g G £+ (Si). 


We next show that {g k } converges to g with respect to the Hilbert metric in E as well, i.e., that 
d H (gk,g) —* 0 as k —>■ oo. Since q is uniformly continuous, given any £ > 0 there exists a 5 > 0 such 


that 

|g(0,2/,l,xi) -g(0,y,l,x 2 )| < aey/jsT 


for any Haq — x 2 \\ < S. It follows that for each k, 


< 

< 


l^fc(^i) - 9k(x 2 )| = 

1 


|^ fc (l,a?i) - ^( 1 ,^ 2 )| 


|| < ^’fc(l; ') || 2 J R" 

1 

«vWll^(°>-)ll 


Il ( i2’fc( 1 ) •)lb 
k( 0 , y, 1 , xi) - q( 0 , y, 1 , x 2 ) |^( 0 , y)dy 


-||£fc( 0 , •)lli Qte 's/i^J = 


which shows that g k is uniformly continuous. Moreover, because the value of 5 doesn’t depend on k, {g k } 
is in fact uniformly equicontinuous. Considering that this sequence is also uniformly bounded ( fT 6 | ), we 
apply the Arzela-Ascoli theorem 11371 to deduce that it has a uniformly convergent subsequence {g nk }- 
On the other hand, we already know that {g k } converges to g in the 2-norm. Hence, { g rik } converges 
uniformly to g and therefore g is uniformly continuous. This implies \\g nk /g — 11| 00 —> 0, from which 
dfr(gn k - g) —> 0 follows easily. Combining this with the fact that {g k } is Cauchy with respect to the 
Hilbert metric we conclude that dn(gk,g) —>■ 0. 


Now let C be the projection of C onto E, namely, C(f ) = C(/)/||C(/)|| 2 , then C is contractive in E 
with respect to the Hilbert metric, and it is therefore continuous. It follows that 

g = lim g k+1 = lim C(g k ) = C( lim g k ) = C(g). (17) 

h —^00 /c —>00 fc—>• 00 

Here the limit is taken with respect to the Hilbert metric. Due to the contractiveness of C, the limit g is 
independent of the choice of initial value •) and it is therefore unique. 


We finally argue that g = C(g). Noting that g = C(g), we have C(g) = Xg for some fix A = ||C(g )|| 2 > 0. 
Since (0, T> po (f)) = ||p 0 ||i = 1 for any 0 <G £+ (S'o), we have 

IIPolli = (S °V pi g, V po o£ oV Pl g) 

= (Vpig, £ ] o V po o £ o £> g ) 

V -V-" 

c 

= {Vpig, Xg) 

= A||pi||i, 
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and it follows that A = 1. Now let 

0(V) = 9 
¥>(V) = V pi 9 
¥>( 0,0 = £°' D pi9 

0 (0, 0 = V po o£o V pi 9 , 

then (^(0, •), </?(0, •), <^(1, •), <£(1, ■) is a solution to the Schrodinger system ( flQ| ). The “uniqueness” follows 
from the uniqueness of g. 


Proposition [T] can be rephrased as follows. 

Proposition 2: Suppose S 0 C M r \ .S', C M n are compact sets, p 0 and //1 are absolutely continuous 
probability measures on the cr-fields £ 0 , Si of Borel sets of So and Si, respectively, and that q is an 
everywhere continuous, strictly positive function on S 0 x S\, then there is a unique pair n, v of measures 
on £ 0 x Ei for which 


1 ) 7T is a probability measure and v is a finite product measure; 

2) vr(E 0 x Si) = Po(E 0 ), tr(So x Ef) = pi{Ei), 
for some E 0 G So, E\ G S|; 

3) dTr/dv = q. 


Proof: Apply Proposition [I] to obtain a solution 99 ( 0 , •), <^(0, 
system (jTO]). Then, define v by the formula 


99 ( 1 , •), </?(!, •) to the Schrodinger 


and 7 r by 


v[Eo x Ei) = 


7t(E 0 x Ei) = 


(p{ 0, x)dx 


Eq 


ip(l,x)dx 


'Ei 


0(0, x)g(0, X , 1, 1 /)^( 1 , y)dxdy 


(18) 


(19) 


' Eg X El 


for any E 0 G S 0 , Ei G Si. It is easy to see n and v satisfy all the three properties in the statement. This 
proves the existence part of the proposition. On the other hand, any pair 7r and v satisfying the above 
three requirements can be written as ( |~i~8| ) and ( fl9| ). The uniqueness of the solution 7r and v follows from 
the uniqueness of the solution to the Schrodinger system ( fTd| ) in Proposition |T[ ■ 

Remark 5: It is easy to see that in the above, /i 0 , Pi don’t need to be probability measures, only to 
have equal mass, i.e. po(So) — Pi(Si)- The statements of the propositions hold verbatim. 


The results in the above lemma and proposition can be extended to /i 0 , P\ having non-compact support. 
In fact, Proposition |2] and then, the following extension to general measures (Theorem [6]) were given by 
Jamison in [f23ll . Jamison’s proof of Proposition [2] does not invoke the Hilbert metric or contractiveness. 
The proof of Theorem [6] is reworked in the Appendix for completeness. 

Theorem 6: Suppose p 0 an( l I 1 2 3 1 are absolutely continuous probability measures on the rr-ficld S of 
Borel sets of W 1 , and q is an everywhere continuous, strictly positive function on M n x M n , then there is 
a unique pair 7r, v of measures on S x S for which 


1) 7T is a probability measure and v is a cr-finite product measure; 

2) 7t(E x M n ) = fjbo(E), 7r(M n x E) = pi(E), Eg S; 

3) d7r/du = q. 
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IV. Computational algorithm 


Given marginal probability measures p 0 (dx) = p 0 (x)dx and pi(dx) = pi(x)dx on M", we begin by 
specifying a compact subset S C M n that supports most of the two densities, i.e., such that p 0 (S) and 
Pi(S) are both >1 — 5, for sufficiently small value 8 > 0. We treat the restriction to S for both, after 
normalization so that they integrate to 1, as the end-point marginals for which we wish to construct 
the corresponding entropic and displacement interpolation. Thus, for the purposes of this section and 
subsequent examples/applications both, p 0 and pi are supported on a compact subset S G M n . 


It is important to consider the potential spread of the mass along entropic interpolation and the need for 
S to support the flow without “excessive” constraints at the boundary. Thus, depending on the application 
(imaging, spectral analysis, etc.) a sufficiently conservative choice for S, beyond what is suggested in the 
previous paragraph, may be in order. 

Next, we discretize in space and represent functions <p(i, x), <p(i, x) (i G {0,1}) using (column) vectors 
4>i, (pi, e.g., 4>i(k) = ip(i,x k ) for a choice of sample points x k G S, k — 1 ,..., N and, likewise, p 0 , pi 
(column) vectors representing the sampled values of the two densities. Then, we recast GB as operations 
on these vectors. Accordingly, 


S : 

01 ^ 00 = <501 

(20a) 

: 

00 ^ 01 = QVo 

(20b) 

■ 

00 1 ^ 00 = P0 0 00 

(20c) 


01 01 = Pi 0 01 , 

(20d) 


using the same symbols for the corresponding operators, and using 0 to denote entry-wise division 
between two vectors, i.e., a 0 b := [(!;/&*]. Accordingly, here, 0 represents a matrix. The values of its 
entries depend on the diffusivity e. By iterating the discretized action of C, we obtain a fixed-point pair 
of vectors (0j,0j). From this we can readily construct the entropic interpolation between the marginals 
discretizing ([7]) for intermediate points in time. Since the contraction ratio k(C) < 1, the worst case 
convergence speed is linear. As noted earlier, the displacement interpolation can be approximated by the 
entropic interpolation for small enough e (Theorem [3]). 


O'Pi) 3 — 


We note that, when the noise intensity e is too small, numerical issues may arise due to machine 
precision. One way to mediate this effect, especially regarding ( |20cp0d[ ), is to store and operate with 
the logarithms of elements in Q, Pi,<t>i,<t>i, denoted by ZQ, (p*, Z0j, Z0; (i G {0,1}). More specifically, let 
(IQ)jk = log Qjk and set 

f log(/?i)j if {pi)j > 0, 

| —10000 otherwise, 

(since, e.g., in double precision floating point numbers < 10 -308 are taken to be zero). Carrying out 
operations in ( |20] ) in logarithmic coordinates, V Po becomes 

(l<po)j = (lpo)j ~ (^0o)j> 

and similarly for V Pl . The map becomes 

(/0i) fc = log^exp(/<2 ifc + (l(po)j) 
j 

= M k + log ^ exp (lQ jk + (/0 o )j - M k ), 


(and similarly for S) where M k = rna Xj{lQj k + (, l(po)j }■ In this way the dominant part of the power 
index, which is M k , is respected. 
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Fig. 1: Marginal distributions 


V. Numerical examples 

In this section we study three examples to illustrate the algorithm. The first example is a Schrodinger 
bridge problem on one dimensional space. Right after that is an OMT version of it with the same marginal 
distributions. We observe that as the diffusivity goes to zero, the solution of the Schrodinger bridge problem 
converges to the solution of the OMT problem. The last example highlights interpolation between images 
where computations have been carried out using the above algorithm. 


A. One dimensional Schrodinger bridge 


Consider a collection of Brownian particles in one dimension and let 


and 


P oO) 


0.2 — 0.2cos(37nc) + 0.2 
5 — 5 cos(67nr — 47 t) + 0.2 


if 0 < x < 2/3 
if 2/3 < x < 1, 


Pi(x) = p 0 (l - x), 


represent their distribution at t = 0 and t — 1, respectively; see Figure [T] We seek to construct entropic 
interpolants of the two densities over the interval [0, 1]. We do so by determining Schrodinger bridges 
between p 0 and pi with diffusion kernel 


Q(s,x,t,y) 


1 

a/2vt (t 



(x - y ) 2 ' 


2 (t — s)e 


s <t, 


for a range of values for the diffusion coefficient y/e of the underlying Brownian motion. Figures |2] (a-d) 
depict the corresponding interpolation flow taking y/e — 0.5, 0.2, 0.1, 0.01, respectively. It is worth 
noting that, for larger diffusion coefficient, starting at t = 0, the flow of density spreads out before “re¬ 
assembling” at the other end-point t — 1. We also observe an apparent time-symmetry of the evolution - 
a point that was central to Schrodinger’s thinking already in ll38ll . 
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(a) = 0.5 



(b) Ve = 0.2 



(c) \ft = 0.1 



(d) = 0.01 


Fig. 2: Entropic interpolation 


B. One dimensional OMT 


Consider a collection of particles with the same marginal distributions p 0 and f>\ as the previous 
example. However, instead of doing Brownian motion, we assume the particles are driven by some 
unknown “deterministic” forces. Under this assumption, the “displacement interpolation” based on OMT 
is a reasonable choice. For the monotonicity of the optimal map, a “closed form” solution is available for 
one dimensional OMT problem. The optimal map y = T(x) satisfies 


/ Po(y)dy = 

J —oo 

and the interpolation flow p t , 0 < t < 1 satisfies 

/ Po(y)dy = 


,T(x) 


Pi{y)dy, 


(1 -t)x-\-tT(x) 


Pt{y)dy. 


( 21 a) 


(21b) 


Figure [3] gives the displacement interpolation by using the exact form of the solution in ( |2T[ ). This can 
be compared to the entropic interpolation with yfe = 0.01 (Figure [2jd)); for more detailed comparison 
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Fig. 3: Dispacement interpolation (OMT) 



Fig. 4: Comparison between OMT and Schrodinger bridge 


we plot the corresponding densities at t = 1/2 in Figure |4j From an applications standpoint, it is worth 
noting that entropic interpolation with a “small” (but not insignificant) diffusion coefficient suppresses 
potentially spurious peaks as seen by comparing Figure [3] with Figure [2jc). 


C. Image morphing 

In this subsection, we consider interpolation/morphing of 2D images. When suitably normalized, these 
can be viewed as probability densities on M 2 . Interpolation is important in many applications. One such 
application is Magnetic Resonance Imaging (MRI) where due to cost and time limitations, a limited number 
of slices are scanned. Suitable interpolation between the 2D-slices may yield a better 3D reconstruction. 

Figure [5] shows the two brain images that we seek to interpolate. The data is available as mri.dat 
in Matlab®. Figure [6] compares displacement interpolants at t — 0.2, 0.4, 0.6, 0.8, respectively, based 
on solving a Schrodinger bridge problem with diffusivity e = 0.01 using our numerical algorithm. For 
comparison, we display in Figure [7] another set of interpolants corresponding to larger diffusivity, namely, 
e = 0.04. As expected, we observe a more blurry set of interpolants due to the larger diffusivity. 
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20 40 60 80 100 120 20 40 60 80 100 120 


(a) t = 0 (b) t = 1 

Fig. 5: MRI slices at two different points 



20 40 60 80 100 120 20 40 60 80 100 120 20 40 60 80 100 120 20 40 60 80 100 120 


(a) t = 0.2 (b) t = 0.4 (c) t = 0.6 (d) t = 0.8 

Fig. 6: Interpolation with e = 0.01 


VI. Concluding remarks 

The present paper is concerned with the problem of interpolating distributions. Linear interpolation, 

Ht(dx ) = ((1 — t)po(x) + tpi(x)) dx, for t G [0,1], 

while widely used in signal analysis, is deeply pathological from many different angles. For instance, in 
the context of image processing and signal analysis, linear interpolation creates fade-in fade-out effects. 
To see this consider two Gaussian distributions with the same variance and sufficiently different means. 
Linear interpolation is bi-modal and the two peaks of p t trade-off relative significance as t changes from 
0 to 1. Clearly, this is an undesirable feature. If on the other hand, p t represents the density of particles, 
linear interpolation of one-time marginals requires infinite flow velocity (suggesting teleportation rather 
than a physically realistic flow of the particles). Likewise, this feature is unreasonable on physical grounds. 

Instead, and for those very reasons, other methods of interpolation have been pursued with the geometry 




(b) t = 0.4 


(c) t = 0.6 


Fig. 7: Interpolation with e = 0.04 


(a) t = 0.2 


(d) t = 0.8 
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of OMT taking a prominent role, see e.g., ff30l . [f43ll . [JTT|. [[241, 0511 • In this, the displacement interpolation 
path for one-time densities ([4]), 

Ht(dx) = ((1 - t)x + fT(x))jj yu 0 (dx), for t G [0,1], 

requires computing the optimal transport map T which in itself is a challenging task. More recently, it was 
noted that the OMT problem with quadratic cost is a limiting form of the stochastic control problem to 
steer a controlled diffusion between two end-point marginals with minimum energy PT1 . P2l . If28l . ||29l , 
m, urn The latter is equivalent to the problem posed by Erwin Schrodinger in 1931, the Schrodinger 
bridge problem, namely to identify the most likely flow of particles between two empirical one-time 
marginals. The connection between OMT and Schrodinger bridges has led to a fast developing circle of 
ideas with important implications in stochastic control and many other fields. In particular, the problem 
to steer deterministic systems with random initial and terminal conditions is akin to OMT and can be 
solved with similar methods lfl3l . Equally important, however, is the reverse implication: techniques from 
the theory of Schrodinger bridges can be used to solve the OMT as the entropic interpolation 

Ht(dx) = (p(t,x)ip(t,x)dx, for t G [0,1], 

with cp(t,x), ip(t,x ) obtained by solving the Schrodinger system ( fTO] ) approximates the displacement 
interpolation as the power of the stochastic excitation e (see Theorem ([3])) tends to zero iflOl . Ifl3l . 

The purpose of the present work has been to draw attention to points of contact between OMT, the 
Schrodinger bridge problem, and the Hilbert metric. The Hilbert metric allows an independent direct 
approach to solving the Schrodinger system of equations as it renders a certain key map contractive 
(C in Section [TIT]). Fixed points of this map provide a solution to the Schrodinger bridge problem and 
the contractiveness ensures linear convergence. We expect that further numerical studies and specialized 
software will permit applying the approach to problems of substantially large scale. 


VII. Appendix 


[Proof of Theorem [6] - largely based on Jamison’s 11231 arguments] Let A x C A 2 C d 3 ... be an 
increasing sequence of compact Borel sets and Bx C B 2 C B 3 ... be an increasing sequence of compact 
Borel sets such that 


Ho(A k ) 

Ati(-Bfc) 


1 

(k + l) 2 ’ 
1 

(AH- l) 2 ' 


Let C k = A k x B k , and let Eg = E fl A k = {E D A k \ E G E}, Ej = E D B k , then Eg x E^ is the class 
of Borel subsets of C k . On C k , by Theorem [2] and Remark [5] for any pair of A k , B k , there exists a finite 
product measure v k on Eg x Ef and a measure n k on E k x E k such that 


7T k (E 0 x B k ) = ^o(Sg), WE 0 G E k 
7r k (A k x E x ) = Hi(Ei), WE 1 G Ej 


and 

d7T n 

— = g on C k . (22) 

The measures 7r fc and v k can be extended to the space M” x M" by setting them equal to 0 on sets disjoint 
from C k . Fix m and construct a set 11,„ of measures in E™ x E™ by restricting {tt} to C rn . The set n m 
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is of course tight since C m is compact. The set II m is also uniformly bounded in total variation norm 
since 

7 T k {C m ) < ir k (A m X B k ) = Ho(Am) = 1 - 7-—ry. (23) 

ym + l) z 

By (extension of) Prokhorov’s theorem |[6|| [Theorem 8.6.2.], the set II m is sequential compact with respect 
to weak topology and therefore has a weakly convergent subsequence. This implies the existence of a 
measure it on E x S whose restriction to E™ x E™ is for each m the weak limit relative to C(C m ) of 
a subsequence in II m . Let n ke denote this subsequence, then f gdir ke —> f gdir for any continuous g on 
M n x with compact support. We next show f gdn ke —> f gdn for any bounded continuous g. It is 
enough to show n is bounded. Combine ( |23| ) and 


we obtain 


and 


It follows that 


vr \C m ) > n k (A k xB m )+n k (A m xB k )-n k (C k ) 

1 i i 

1_ (m + 1) 2 +1_ (m + 1) 2 (k + l) 2 

2 

> 1 


1 - 


(m + l ) 2 
2 


(m + l ) 2 


< 7r(C m ) < 1 - 


' K {C m \C m _ i) < 


(m + l ) 2 ’ 


2 1 
oc 


(m + 1) 2 m 2 m 2 

OO OO 

< U Cu ( U C m\Cm- 0 ) < OO, 


m =1 


m =2 


and therefore 7 r is bounded (obviously 7 r(M n \ (J ™ =l C m ) = 0 since 7r fc (R n \Um=i Cm) = 0). Thus, the 
sequence {n ke } weakly converges to tt and n is a probability measure (total mass is 1). Let //q, g k be the 
marginals of n k , then it is clear {/ip} and {//|(} weakly converge to /i 0 and //1 respectively. On the other 
hand, {/Xq} and { g k } weakly converge to the marginals of n since {n k } weakly converge to it. Therefore 
fi 0 and Hi must be the marginals of n. 


So far we establish 2) and the first half of 1). We now show 3) and the second half of 1). Fix m and let 
/ be a bounded continuous function with support in C m - The restriction of f/q to C rn is also a bounded 


continuous function, so by (22) 


fdv kl = / ( f/q)dir kt ->• / (// q)dir 


ke 


(24) 


as i —>■ oo. This shows that the restriction of v k ‘ to C rn converges weakly to a finite measure //„, (in fact 
du m = dn/q on C m ), from which we can construction a cr-finite measure v whose restriction to C m is 
u m , m — 1,2,.... In view of ( [24] ) we conclude that du/dn = 1/q and dn/du = q follows. Each u m is a 
product measure based on the fact v k is a product measure. Therefore v must be a product measure. This 
completes the proof of the existence of tt and v that satisfy all the 3 properties. 

To establish that v and tt are unique, assume that v’ is a product measure and it ' a probability measure 
for which 1),2) and 3) hold. Then 


Ho(E) = 


qdv = 


qdu' 


' Exl" 


' ExR n 


(25) 
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and 


Hi (E) = / qdv = / qdv' 

Jw n xE Jw n xE 


for each E <G E. Suppose v = v 0 x z/ 1; v' = v' Q x v\. Let /i 0 (ai) = / q{x,y)vi{dy),x G M n , /ii(t/) = 
f q(x,y)v 0 (dx),y G M”, and let h(x,y) = h 0 (x)h 1 (y), (x,y) G M n x M. n . Let k 0 ,ki and k be similarly 
defined but with i/ 0 , v[ replacing v 0 , v\ respectively. Let g 0 and g\ be bounded E-measurable functions 
on M n , and let g(x,y ) = g 0 (x)gi(y), (x,y) Gl"x W 1 . By virtue of ( |25| ), we have / godyo = f g 0 h 0 dv 0 
and f gidgi = f g\h | dig. Multiplying corresponding sides of these two equations, we have 

*) = /,**, 

Since h is strictly positive, we can rewrite this as 

/ ( g/h)d(no x Hi) = / gdu. 


Similarly 


(g/k)d(n o x hi) = / 


(26) 


(27) 


The definition of E x E as the cr-field generated by the field of finite disjoint unions of rectangles E x F 


with E, F G £ ensures that ( |26| ) and ( |27j ) hold for all non-negative E x E-measurable functions g. Let cr 0 
and o’i be bounded E-measurable functions on M n , and let cr(x,y) = cr 0 (x) + <Ji(y), (x,y) G M n x M n . 
Then 

j ad(fi 0 x Hi) — j &odHo + J cridni 

= / aqdv = / ( aq/h)d(no x Hi) 


by virtue of ( |26l ) and ([27]). Using z/' instead of u, we obtain similarly J crd(/i 0 x //i) = J ( aq/k)d(Ho x /ii). 
Therefore we conclude that 


{aq/h)d(no X Hi) = / ( crq/k)d(Ho X Hi)- 


(28) 


Since a is bounded and qdv is a probability measure, the common value of the two sides of ( [28] ) is finite. 
Thus we get 

/ aqil/h — l/k)d(Ho x Hi) = 0- 


In particular, this last equation holds when we take a specific a as 

l/h 0 (x) l/fci(y) 


(j(x,y) = 


from which we deduce 


1 /h 0 (x) + l/k Q (x) l/hi(y) + l/h(y) 

j q(l/h — 1 /k) 2 d(no X Hi) = 0. 


(29) 

(30) 


Since q > 0, we must have h = k on the support of Ho x Hi- It now follows from ( |26[ ) and ( [27j ) that 
v = v ', and it follows from dn/dv = dx'/di/ that x = x'. This completes the proof. 
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